{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The AI behind Seeing AI\n",
    "\n",
    "The Seeing AI app (available on the iOS App Store) provides tools for blind and vision-impaired people to assist with everyday tasks. Many of the functions in the app are implemented with [Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/?WT.mc_id=ODSC-workshop-davidsmi) in Azure.\n",
    "\n",
    "In this workshop, we'll experiment with various [Cognitive Service APIs](https://azure.microsoft.com/en-us/services/cognitive-services/directory/vision/?WT.mc_id=ODSC-workshop-davidsmi) used in the Seeing AI app to see how they behave and learn about the type of data they return. We will be using only the web demo interfaces, so you won't need an Azure subscription and you'll only need a web browser to try them out.\n",
    "\n",
    "No programming will be required in this part of the workshop. If you'd like to learn how to use Cognitive Services from application via the API or language SDKs, try this Microsoft Learn module: [Process images with the Computer Vision service](https://docs.microsoft.com/en-us/learn/modules/create-computer-vision-service-to-classify-images/index?WT.mc_id=ODSC-workshop-davidsmi)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optional: Install Seeing AI\n",
    "\n",
    "If you have an iPhone or iPad, search for \"Seeing AI\" in the App Store to install it. (Seeing AI is not available for Android.) The app is free."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Finding data\n",
    "\n",
    "The [Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/directory/vision/?WT.mc_id=AILive-workshop-davidsmi) demo pages typically include sample data (photos, etc) you can try, but we encourage you to find your own data and see what happens. You can use a file on your laptop or the URL of a file on the Web. Here are some things you can try:\n",
    "\n",
    "* [Random Wikimedia image page](https://commons.wikimedia.org/w/index.php?title=Special:Random&prop=images). Click the download button to get a direct URL (often with a choice of resolutions).\n",
    "* [ImageNet](http://www.image-net.org/). Type a word into the search box to find images by category.\n",
    "* [Google Images](https://images.google.com/). It can be tricky to get a link to an image: click on a result and right-click on a large image (_not_ a thumbnail) to get a URL.\n",
    "* Your own media files. Unless they are accessible by URL, you'll need to download them to your laptop first (in a suitable format/size) and use the \"Browse\" button on the demo page to upload them. Microsoft does not store media submitted on these pages."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Computer Vision \n",
    "\n",
    "The [Computer Vision services](https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/home?WT.mc_id=AILive-workshop-davidsmi) provide APIs related to extracting information from images: people and faces, objects, text and more. \n",
    "\n",
    "## \"Person\" feature in Seeing AI\n",
    "\n",
    "In the Seeing AI app, the \"Person\" feature detects people in the camera frame in real time, and describes them when pressing the camera button. Try it in the app now.\n",
    "\n",
    "* Click the \"Person\" button in the Seeing AI app\n",
    "* Point the camera at a face; an audio guide will alert you when a face is in frame. (There's a button to switch cameras for a selfie.)\n",
    "* Tap the screen to get detailed information about the faces on the screen: apparent age and gender, description, expression.\n",
    "\n",
    "## Face API\n",
    "\n",
    "The [Face API](https://azure.microsoft.com/en-us/services/cognitive-services/face/?WT.mc_id=AILive-workshop-davidsmi#detection) will detect one or more human faces in an image, and return information about the location and attribute of those faces.\n",
    "\n",
    "* Visit the [Face Detection](https://azure.microsoft.com/en-us/services/cognitive-services/face/?WT.mc_id=AILive-workshop-davidsmi#detection) page.\n",
    "* Copy the URL of an image to the \"Image URL\" box, or click \"Browse\" to upload an image, then click Submit.\n",
    "* Your image will be shown on the page with a rectangle bounding all detected faces.\n",
    "* Look at the results in the right pane to see the detailed analysis. The JSON array will include one element for each detected face.\n",
    "* The `faceRectangle` property defines the bounding box of the face(s).\n",
    "* The `faceAttributes` propery includes details about the person's appearance: the person's apparent gender and age, and other visible attributes such as hair color/baldness, type of facial hair, presence of glasses and headwear, use of makeup, etc.\n",
    "* The `faceLandmarks` property locates specific parts of the face (pupils, tip of the nose, corner of the mouth, etc) within the image\n",
    "* Check the [FACE API Reference](https://docs.microsoft.com/en-us/azure/cognitive-services/face/apireference?WT.mc_id=AILive-workshop-davidsmi) for other attributes that may be of interest.\n",
    "\n",
    "Things to try with [Face Detection](https://azure.microsoft.com/en-us/services/cognitive-services/face/?WT.mc_id=AILive-workshop-davidsmi#detection): \n",
    "\n",
    "* [Sikh man](https://upload.wikimedia.org/wikipedia/commons/thumb/a/a4/Sikh_man%2C_Agra_10.jpg/648px-Sikh_man%2C_Agra_10.jpg)\n",
    "* [Woman in comic store](https://upload.wikimedia.org/wikipedia/commons/e/e7/7.9.10OliviaMunnByLuigiNovi53.jpg)\n",
    "* [Contemptuous man](https://upload.wikimedia.org/wikipedia/commons/thumb/4/40/Man_wearing_red_sunglasses.jpg/1024px-Man_wearing_red_sunglasses.jpg)\n",
    "\n",
    "## Emotion Recognition\n",
    "\n",
    "The Face API returns information about the subject's apparent emotion, based on the facial expression. That information is included in the `faceAttributes` property also includes  and intensity of emotion (anger, happiness, surprise, etc) on a scale from 0-1.\n",
    "\n",
    "You can try this out in the [Emotion Recognition demo](https://azure.microsoft.com/en-us/services/cognitive-services/face/?WT.mc_id=AILive-workshop-davidsmi#recognition) app.\n",
    "\n",
    "Things to try with [Emotion Recognition](https://azure.microsoft.com/en-us/services/cognitive-services/face/?WT.mc_id=AILive-workshop-davidsmi#recognition):\n",
    "\n",
    "* [Happy man](https://upload.wikimedia.org/wikipedia/commons/4/4b/Sven.Littkowski.2011.JPG)\n",
    "* [Neutral boy](https://upload.wikimedia.org/wikipedia/commons/2/20/Bangalore%2C_India_%28844327901%29.jpg)\n",
    "* [Grimacing woman](https://upload.wikimedia.org/wikipedia/commons/e/e1/Grimace_2.jpg)\n",
    "* [Angry woman](https://upload.wikimedia.org/wikipedia/commons/0/08/Angry_woman.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Documents\n",
    "\n",
    "The Seeing AI app provides several tools to read written text aloud:\n",
    "\n",
    "* **Short Text** speaks aloud text as it becomes visible in the camera. It's intended for recognizing simple text, like signposts and labels.\n",
    "* **Document** is intended to read the text from a page. Audio cues help the user bring the entire page into frame, and then the entire document is scanned read aloud.\n",
    "* **Handwriting** (beta) analyzes a photo of a handwritten document and reads the text aloud. \"Document\" can also read some handwriting, but this works better with handwritten notes (and worse with printed documents).\n",
    "\n",
    "## Extract Printed Text\n",
    "\n",
    "The [OCR Service](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/?WT.mc_id=ODSC-workshop-davidsmi#text) detects text within images, and returns the extracted text along with the rectangle locating the text within the image. It can detect text in 25 languages, even if it's at an angle with respect to the page. [Detailed documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-extracting-text-ocr?WT.mc_id=AILive-workshop-davidsmi) here. \n",
    "\n",
    "Things to try with [OCR Service](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/#text): \n",
    "\n",
    "* [Business Card](https://upload.wikimedia.org/wikipedia/commons/e/e0/Business_Card.jpg)\n",
    "* [letter of gratitude](https://upload.wikimedia.org/wikipedia/commons/b/bd/Ukranian_contemporary_architecture_and_desing_as_part_of_world_architecture_day_in_Rigga.jpg)\n",
    "* [White House menu](https://upload.wikimedia.org/wikipedia/commons/3/32/State_Dinner_Menu_King_Hussein_of_Jordan.jpg).\n",
    "\n",
    "## Handwriting\n",
    "\n",
    "The \"Handwriting\" tool in Seeing AI can be used to read the text in a handwritten note.\n",
    "\n",
    "The [Handwriting Recognition](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/?WT.mc_id=AILive-workshop-davidsmi#handwriting) API performs the same task for a picture of handrwiting.\n",
    "\n",
    "Things to try with [Handwriting Recognition](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/?WT.mc_id=ODSC-workshop-davidsmi#handwriting):\n",
    "\n",
    "* [Analytics notepad](https://upload.wikimedia.org/wikipedia/commons/b/bd/Handwritten_Text.jpg)\n",
    "* [Letter to Lt. Boomer](https://upload.wikimedia.org/wikipedia/commons/6/60/Letter_addressed_to_Lt._General_Boomer%2C_30_May_1991_%285840049586%29.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scene Description\n",
    "\n",
    "The \"Scene\" feature in Seeing AI will provide a spoken description of an image. Point the camera at an interesting scene and tap the screen to generate a description (which will also be spoken aloud). \n",
    "\n",
    "You can also use the \"hamburger\" menu to browse photos on your iPhone.\n",
    "\n",
    "## Image Analysis\n",
    "\n",
    "Provide an image to the [Analyze an image](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/?WT.mc_id=AILive-workshop-davidsmi#analyze) tool, and the API returns\n",
    "\n",
    "* A natural-language description of the scene (with associated confidence score, from 0%-100%)\n",
    "* A list of objects detected in the scene (with associated confidence for each)\n",
    "* Attributes of the image (black and white? line drawing? clip art?)\n",
    "* Detection of \"racy\" and \"adult\" content (with associated scores)\n",
    "* Detected faces (location and apparent age/gender -- use Face API for more details)\n",
    "* Color analysis (dominant foreground, background and accent colors)\n",
    "\n",
    "Things to try with [Analyze an image](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/?WT.mc_id=AILive-workshop-davidsmi#analyze):\n",
    "\n",
    "* [Scarborough Harbour](https://upload.wikimedia.org/wikipedia/commons/4/4c/Scarborough_MMB_24_Harbour.jpg): \"a harbor filled with lots of small boats in a body of water\"\n",
    "* [Husky](https://upload.wikimedia.org/wikipedia/commons/8/80/Siberian_husky1.JPG): \"a dog sitting in front of a fence\"\n",
    "* [Taj Mahal](https://upload.wikimedia.org/wikipedia/commons/thumb/a/a3/Taj_Mahal-11.jpg/1024px-Taj_Mahal-11.jpg): \"a group of people walking in front of Taj Mahal\"\n",
    "* [Pyramids](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f1/Egypt-Archaeologists.jpg/2048px-Egypt-Archaeologists.jpg): \"a group of people on a beach\" (confidence 66%)\n",
    "\n",
    "We'll look at generating scene descriptions via the Computer Vision API using R in the next section."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R [conda env:anaconda3]",
   "language": "R",
   "name": "conda-env-anaconda3-r"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
